space invader
Attention Trajectories as a Diagnostic Axis for Deep Reinforcement Learning
Beylier, Charlotte, Selder, Hannah, Fleig, Arthur, Hofmann, Simon M., Scherf, Nico
While deep reinforcement learning agents demonstrate high performance across domains, their internal decision processes remain difficult to interp ret when evaluated only through performance metrics. In particular, it is poorly understoo d which input features agents rely on, how these dependencies evolve during training, and how t hey relate to behavior. We introduce a scientific methodology for analyzing the learni ng process through quantitative analysis of saliency. This approach aggregates saliency in formation at the object and modality level into hierarchical attention profiles, quantifyin g how agents allocate attention over time, thereby forming attention trajectories throughout t raining. Applied to Atari benchmarks, custom Pong environments, and muscle-actuated biom echanical user simulations in visuomotor interactive tasks, this methodology uncovers a lgorithm-specific attention biases, reveals unintended reward-driven strategies, and diagnos es overfitting to redundant sensory channels. These patterns correspond to measurable behavio ral differences, demonstrating empirical links between attention profiles, learning dynam ics, and agent behavior. To assess robustness of the attention profiles, we validate our finding s across multiple saliency methods and environments. The results establish attention traj ectories as a promising diagnostic axis for tracing how feature reliance develops during train ing and for identifying biases and vulnerabilities invisible to performance metrics alone.
- Europe > Germany > Saxony > Leipzig (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
- (2 more...)
- Health & Medicine (1.00)
- Leisure & Entertainment > Games (0.94)
- North America > United States (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > France (0.04)
- (2 more...)
- Law (0.46)
- Information Technology > Security & Privacy (0.46)
Learning Game-Playing Agents with Generative Code Optimization
Kuang, Zhiyi, Rong, Ryan, Yuan, YuCheng, Nie, Allen
We present a generative optimization approach for learning game-playing agents, where policies are represented as Python programs and refined using large language models (LLMs). Our method treats decision-making policies as self-evolving code, with current observation as input and an in-game action as output, enabling agents to self-improve through execution traces and natural language feedback with minimal human intervention. Applied to Atari games, our game-playing Python program achieves performance competitive with deep reinforcement learning (RL) baselines while using significantly less training time and much fewer environment interactions. This work highlights the promise of programmatic policy representations for building efficient, adaptable agents capable of complex, long-horizon reasoning.
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > Canada (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Portugal > Braga > Braga (0.04)
- North America > United States > New York > Richmond County > New York City (0.04)
- North America > United States > New York > Queens County > New York City (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (6 more...)
- Law (0.46)
- Information Technology (0.46)
Playing Atari Space Invaders with Sparse Cosine Optimized Policy Evolution
O'Connor, Jim, Nash, Jay B., Gezgin, Derin, Parker, Gary B.
Evolutionary approaches have previously been shown to be effective learning methods for a diverse set of domains. However, the domain of game-playing poses a particular challenge for evolutionary methods due to the inherently large state space of video games. As the size of the input state expands, the size of the policy must also increase in order to effectively learn the temporal patterns in the game space. Consequently, a larger policy must contain more trainable parameters, exponentially increasing the size of the search space. Any increase in search space is highly problematic for evolutionary methods, as increasing the number of trainable parameters is inversely correlated with convergence speed. To reduce the size of the input space while maintaining a meaningful representation of the original space, we introduce Sparse Cosine Optimized Policy Evolution (SCOPE). SCOPE utilizes the Discrete Cosine Transform (DCT) as a pseudo attention mechanism, transforming an input state into a coefficient matrix. By truncating and applying sparsification to this matrix, we reduce the dimensionality of the input space while retaining the highest energy features of the original input. We demonstrate the effectiveness of SCOPE as the policy for the Atari game Space Invaders. In this task, SCOPE with CMA-ES outperforms evolutionary methods that consider an unmodified input state, such as OpenAI-ES and HyperNEAT. SCOPE also outperforms simple reinforcement learning methods, such as DQN and A3C. SCOPE achieves this result through reducing the input size by 53% from 33,600 to 15,625 then using a bilinear affine mapping of sparse DCT coefficients to policy actions learned by the CMA-ES algorithm.
- North America > United States > Connecticut > New London County > New London (0.04)
- North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)
Space Invaders on your wrist: the glory years of Casio video game watches
Over the last couple of weeks I have been tidying our attic, and while the general aim has been to prevent its contents from collapsing through the ceiling, I have a side-mission. My most valued possession when I was twelve was a Casio GD-8 Car Race watch – a digital timepiece that included a built-in racing game on its tiny monochrome LCD display. Two big buttons on the front let you steer left and right to avoid incoming vehicles and your aim was to stay alive as long as possible. I lost count of the number of times it was confiscated by teachers at my school. I used to lend it to the hardest boys in the year, thereby guaranteeing me protection against bullies.
- Information Technology > Hardware (0.52)
- Information Technology > Artificial Intelligence > Games (0.51)
Pixel to policy: DQN Encoders for within & cross-game reinforcement learning
Agrawal, Ashrya, Shah, Priyanshi, Prakash, Sourabh
Reinforcement Learning can be applied to various tasks, and environments. Many of these environments have a similar shared structure, which can be exploited to improve RL performance on other tasks. Transfer learning can be used to take advantage of this shared structure, by learning policies that are transferable across different tasks and environments and can lead to more efficient learning as well as improved performance on a wide range of tasks. This work explores as well as compares the performance between RL models being trained from the scratch and on different approaches of transfer learning. Additionally, the study explores the performance of a model trained on multiple game environments, with the goal of developing a universal game-playing agent as well as transfer learning a pre-trained encoder using DQN, and training it on the same game or a different game. Our DQN model achieves a mean episode reward of 46.16 which even beats the human-level performance with merely 20k episodes which is significantly lower than deepmind's 1M episodes. The achieved mean rewards of 533.42 and 402.17 on the Assault and Space Invader environments respectively, represent noteworthy performance on these challenging environments.
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > California > San Diego County > La Jolla (0.04)
Probing Transfer in Deep Reinforcement Learning without Task Engineering
Rusu, Andrei A., Flennerhag, Sebastian, Rao, Dushyant, Pascanu, Razvan, Hadsell, Raia
We evaluate the use of original game curricula supported by the Atari 2600 console as a heterogeneous transfer benchmark for deep reinforcement learning agents. Game designers created curricula using combinations of several discrete modifications to the basic versions of games such as Space Invaders, Breakout and Freeway, making them progressively more challenging for human players. By formally organising these modifications into several factors of variation, we are able to show that Analyses of Variance (ANOVA) are a potent tool for studying the effects of human-relevant domain changes on the learning and transfer performance of a deep reinforcement learning agent. Since no manual task engineering is needed on our part, leveraging the original multi-factorial design avoids the pitfalls of unintentionally biasing the experimental setup. We find that game design factors have a large and statistically significant impact on an agent's ability to learn, and so do their combinatorial interactions. Furthermore, we show that zero-shot transfer from the basic games to their respective variations is possible, but the variance in performance is also largely explained by interactions between factors. As such, we argue that Atari game curricula offer a challenging benchmark for transfer learning in RL, that can help the community better understand the generalisation capabilities of RL agents along dimensions which meaningfully impact human generalisation performance. As a start, we report that value-function finetuning of regularly trained agents achieves positive transfer in a majority of cases, but significant headroom for algorithmic innovation remains. We conclude with the observation that selective transfer from multiple variants could further improve performance.
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
- North America > United States > Washington > King County > Bellevue (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (3 more...)
Measuring Interventional Robustness in Reinforcement Learning
Avery, Katherine, Kenney, Jack, Amaranath, Pracheta, Cai, Erica, Jensen, David
Recent work in reinforcement learning has focused on several characteristics of learned policies that go beyond maximizing reward. These properties include fairness, explainability, generalization, and robustness. In this paper, we define interventional robustness (IR), a measure of how much variability is introduced into learned policies by incidental aspects of the training procedure, such as the order of training data or the particular exploratory actions taken by agents. A training procedure has high IR when the agents it produces take very similar actions under intervention, despite variation in these incidental aspects of the training procedure. We develop an intuitive, quantitative measure of IR and calculate it for eight algorithms in three Atari environments across dozens of interventions and states. From these experiments, we find that IR varies with the amount of training and type of algorithm and that high performance does not imply high IR, as one might expect.
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
Investigation of Independent Reinforcement Learning Algorithms in Multi-Agent Environments
Lee, Ken Ming, Subramanian, Sriram Ganapathi, Crowley, Mark
Independent reinforcement learning algorithms have no theoretical guarantees for finding the best policy in multi-agent settings. However, in practice, prior works have reported good performance with independent algorithms in some domains and bad performance in others. Moreover, a comprehensive study of the strengths and weaknesses of independent algorithms is lacking in the literature. In this paper, we carry out an empirical comparison of the performance of independent algorithms on four PettingZoo environments that span the three main categories of multi-agent environments, i.e., cooperative, competitive, and mixed. We show that in fully-observable environments, independent algorithms can perform on par with multi-agent algorithms in cooperative and competitive settings. For the mixed environments, we show that agents trained via independent algorithms learn to perform well individually, but fail to learn to cooperate with allies and compete with enemies. We also show that adding recurrence improves the learning of independent algorithms in cooperative partially observable environments.
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Research Report (0.64)
- Overview (0.46)